148 research outputs found

    Building a Generation Knowledge Source using Internet-Accessible Newswire

    Full text link
    In this paper, we describe a method for automatic creation of a knowledge source for text generation using information extraction over the Internet. We present a prototype system called PROFILE which uses a client-server architecture to extract noun-phrase descriptions of entities such as people, places, and organizations. The system serves two purposes: as an information extraction tool, it allows users to search for textual descriptions of entities; as a utility to generate functional descriptions (FD), it is used in a functional-unification based generation system. We present an evaluation of the approach and its applications to natural language generation and summarization.Comment: 8 pages, uses eps

    Gathering Statistics to Aspectually Classify Sentences with a Genetic Algorithm

    Full text link
    This paper presents a method for large corpus analysis to semantically classify an entire clause. In particular, we use cooccurrence statistics among similar clauses to determine the aspectual class of an input clause. The process examines linguistic features of clauses that are relevant to aspectual classification. A genetic algorithm determines what combinations of linguistic features to use for this task.Comment: postscript, 9 pages, Proceedings of the Second International Conference on New Methods in Language Processing, Oflazer and Somers ed

    Paraphrasing Using Given and New Information in a Question-Answer System

    Get PDF
    The design and implementation of a paraphrase component for a natural language question-answer system (CO-OP) is presented. A major point made is the role of given and new information in formulating a paraphrase that differs in a meaningful way from the user\u27s question. A description is also given of the transformational grammar used by the paraphraser to generate questions

    Using the Annotated Bibliography as a Resource for Indicative Summarization

    Get PDF
    We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated bibliographies cover certain aspects of summarization that have not been well-covered by other summary corpora, and motivate why they constitute an important form to study for information retrieval. We detail our methodology for collecting the corpus, and overview our document feature markup that we introduced to facilitate summary analysis. We present the characteristics of the corpus, methods of collection, and show its use in finding the distribution of types of information included in indicative summaries and their relative ordering within the summaries.Comment: 8 pages, 3 figure

    Coordinating Text and Graphics in Explanation Generation

    Get PDF
    To generate multimedia explanations, a system must be able to coordinate the use of different media in a single explanation. In this paper, we present an architecture that we have developed for COMET (COordinated Multimedia Explanation Testbed), a system that generates directions for equipment maintenance and repair, and we show how it addresses the coordination problem. In particular, we focus on the use of a single content planner that produces a common content description used by multiple media-specific generators, a media coordinator that makes a f'me-grained division of information between media, and bidirectional interaction between media-specific generators to allow influence across media.

    Resources for Evaluation of Summarization Techniques

    Full text link
    We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with construction of the test set which apply broadly to the construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st

    Prosody Modelling in Concept-to-Speech Generation: Methodological Issues

    Get PDF
    We explore three issues for the development of concept-to-speech (CTS) systems. We identify information available in a language-generation system that has the potential to impact prosody; investigate the role played by different corpora in CTS prosody modelling; and explore different methodologies for learning how linguistic features impact prosody. Our major focus is on the comparison of two machine learning methodologies: generalized rule induction and memory-based learning. We describe this work in the context of multimedia abstract generation of intensive care (MAGIC) data, a system that produces multimedia brings of the status of patients who have just undergone a bypass operation

    From Text to Speech Summarization

    Get PDF
    In this paper, we present approaches used in text summarization, showing how they can be adapted for speech summarization and where they fall short. Informal style and apparent lack of structure in speech mean that the typical approaches used for text summarization must be extended for use with speech. We illustrate how features derived from speech can help determine summary content within two ongoing summarization projects at Columbia University

    Generating multimedia briefings: coordinating language and illustration

    Get PDF
    AbstractCommunication can be more effective when several media (such as text, speech, or graphics) are integrated and coordinated to present information. This changes the nature of media-specific generation (e.g., language or graphics generation), which must take into account the multimedia context in which it occurs. This paper presents work on coordinating and integrating speech, text, static and animated three-dimensional graphics, and stored images, as part of several systems we have developed at Columbia University. A particular focus of our work has been on the generation of presentations that brief a user on information of interes
    • …
    corecore